Abstract: In this work we present a quadratic programming approximation of the Semi-Supervised Support Vector Machine (S3VM) problem, namely approximate QP-S3VM, that can be efficiently solved using off-the-shelf optimization packages. We prove that this approximate formulation establishes a relation between the low density separation and the graph-based models of semi-supervised learning (SSL), which is important for developing a unifying framework for semi-supervised learning methods. Furthermore, we propose the novel idea of representing SSL problems as submodular set functions and use efficient submodular optimization algorithms to solve them. Using this new idea we develop a representation of the approximate QP-S3VM as a maximization of a submodular set function, which makes it possible to optimize using efficient greedy algorithms. We demonstrate that the proposed methods are accurate and provide significant improvement in time complexity over the state of the art in the literature.

I. Introduction

The recent advances in information technology impose serious challenges on traditional machine learning algorithms, where classification models are trained using labeled samples. Data collection and storage have never been easier, and therefore using such enormous volumes of data to infer reliable classification models is of utmost importance. Meanwhile, labeling entire data sets to train classification models is no longer a valid option due to the high cost of experienced human annotators. Despite the recent efforts to make annotation of large data sets cheap and reliable by using an online workforce, the collected labeled data can never keep up with the cheap collection of unlabeled data.

Semi-supervised learning (SSL) handles this issue by utilizing large amounts of unlabeled samples, along with labeled samples, to build better performing classifiers. Two assumptions form the basis for the usefulness of unlabeled samples in discriminative SSL methods: the cluster assumption and the smoothness assumption [1]. Although both assumptions use the idea that samples that are close under some distance metric should assume the same label, they inspire different categories of SSL algorithms, namely low density separation methods (for the cluster assumption) and graph-based methods (for the smoothness assumption). In the low density separation methods the unlabeled samples are used to better estimate the boundaries of each class. The graph-based methods use labeled and unlabeled samples to construct a graph representation of the data set, where information is then propagated from the labeled samples to the unlabeled samples through the dense regions of the graph, a process known as label propagation [2].

The practical success and the theoretical robustness of large margin methods in general, and especially Support Vector Machines (SVM), have drawn a lot of attention to Semi-Supervised Support Vector Machines (S3VM) [3]. However, the problem is challenging due to the non-convexity of the objective function. In this paper we propose an approximate S3VM formulation that results in a standard quadratic programming problem, namely approximate QP-S3VM, that can be solved directly using off-the-shelf optimization packages. One important aspect of the proposed formulation is that it uncovers a connection between the S3VM, as a low density separation method, and the graph-based algorithms, which is a helpful step towards a unifying framework for SSL [4]. Furthermore, we present a new formulation of loss-based SSL problems. The new formulation represents SSL problems as set functions and uses the theory of submodular set function optimization to solve them efficiently. Specifically, we present a submodular set function that is equivalent to the proposed approximate QP-S3VM and solve it efficiently using a greedy approach that is well established in optimizing submodular functions [5].

Section I-A provides preliminaries of S3VM and the notations used throughout the paper. The proposed approximate QP-S3VM is detailed in Section II. In Section III we present the submodular formulation of the approximate QP-S3VM. Experimental results are provided in Section IV, followed by the conclusion in Section V.

A. Preliminaries 

Semi-supervised learning uses partially labeled data sets $\mathcal{L} \cup \mathcal{U}$ where $\mathcal{L} = \{(x_i, y_i)\}$ and $\mathcal{U} = \{x_j\}$, $x \in \mathbb{R}^n$, and $y_i \in \{+1, -1\}$. Throughout this paper we use $i$ and $j$ as indices for labeled and unlabeled samples, respectively.

The major body of work on S3VM is based on the idea of solving a standard SVM while treating unknown labels as additional variables [3]. The semi-supervised learning problem is to find the solution of

$$\min_{w, y_j} \mathcal{J}(w, y_j) = \tfrac{1}{2}\|w\|^2 + C \sum_{i \in \mathcal{L}} \ell_i(w, (x_i, y_i)) + C^* \sum_{j \in \mathcal{U}} \ell_j(w, (x_j, y_j)) \qquad (1)$$

where the loss functions for unlabeled samples $\ell_j$ and labeled samples $\ell_i$ are defined as follows:

$$\ell_j(w, (x_j, y_j)) = \max\{0,\; 1 - y_j(\langle w, x_j \rangle + b)\}, \quad y_j \in \{-1, +1\} \qquad (2)$$

$$\ell_i(w, (x_i, y_i)) = \max\{0,\; 1 - y_i(\langle w, x_i \rangle + b)\} \qquad (3)$$

The solution of Eqn. (1) will result in finding the optimal separating hyperplane $w$ and the labels $y_j$ assigned to the unlabeled samples. The loss over labeled and unlabeled samples is controlled by two parameters $C$ and $C^*$, which reflect the confidence in the labels $y_i$ and the cluster assumption, respectively.

Algorithms that solve Eqn. (1) can broadly be divided into combinatorial and continuous optimization algorithms. In continuous optimization algorithms, for a given fixed $w$, the optimal $y_j$ are simply obtained by $\mathrm{sgn}(\langle w, x_j \rangle + b)$. The problem then comes down to a continuous optimization problem in $w$. On the other hand, in combinatorial optimization algorithms, for given $y_j$, the optimization for $w$ is a standard SVM problem. Therefore, if we define a function $\mathcal{I}(y_j)$ such that

$$\mathcal{I}(y_j) = \min_{w} \mathcal{J}(w, y_j) \qquad (4)$$

the problem will be transformed to minimizing $\mathcal{I}(y_j)$ over a set of binary variables where each evaluation of $\mathcal{I}(y_j)$ is a standard SVM optimization problem [6], [7], [8],

$$\min_{y_j} \mathcal{I}(y_j) \qquad (5)$$

Solving Eqn. (5) may lead to degenerate solutions where all the unlabeled samples are assigned to one class. This is usually handled in the literature by enforcing a balancing constraint which makes sure that a certain ratio $r$ of the unlabeled samples are assigned to class $+1$ [3].

II. Quadratic Programming Approximation of S3VM (QP-S3VM)

In Eqn. (5) the combinatorial formulation of S3VM optimizes for the labels $y_j$ that minimize the loss associated with each unlabeled sample. To overcome the hard combinatorial problem, the loss of setting $y_j = 1$, denoted by $\ell_j^+$, is assigned a new variable $p_j$, where $0 \le p_j \le 1$. This variable indicates the probability that $y_j = 1$ is correct. Similarly, the loss of setting $y_j = -1$, denoted by $\ell_j^-$, is given the probability $1 - p_j$. The balancing constraint will have the form $\sum_{j \in \mathcal{U}} p_j = r|\mathcal{U}|$. This modified formulation has the following form:

Problem 1. Continuous optimization formulation of the combinatorial S3VM problem: minimize $\mathcal{J}'(w, P)$ over $w$ and $P$, subject to the constraints on the labeled loss $\ell_i$ and on the unlabeled losses $\ell_j^+$ and $\ell_j^-$, with $0 \le p_j \le 1$.

Now that the problem has been simplified from being combinatorial in $y_j$, $y_j \in \{+1, -1\}$, to being continuous in $p_j$, $p_j \in [0, 1]$, we proceed to find the dual form. Deriving the Lagrangian of the continuous formulation in Problem 1 and applying the Karush-Kuhn-Tucker conditions to it, the obtained dual form is presented in Problem 2.

Problem 2. Dual form of $\min \mathcal{J}'(w, P)$ in Problem 1:

$$\max_{A, B, \Gamma} \mathcal{I}_{Dual} \qquad (7)$$

where

$$\mathcal{I}_{Dual} = A' 1_{|\mathcal{L}|} + (\Gamma + B)' 1_{|\mathcal{U}|} - \tfrac{1}{2}(A \circ Y)' K_{ll} (A \circ Y) - \tfrac{1}{2}(\Gamma - B)' K_{uu} (\Gamma - B) - (A \circ Y)' K_{lu} (\Gamma - B) \qquad (8)$$

subject to

$$0 \le A \le C\, 1_{|\mathcal{L}|}, \qquad 0 \le \Gamma \le C^* P, \qquad 0 \le B \le C^* (1_{|\mathcal{U}|} - P)$$

where

$1_{|\mathcal{L}|}$: a ones vector of length $|\mathcal{L}|$; similarly for $1_{|\mathcal{U}|}$.

$\alpha_i$: Lagrangian multiplier of the labeled loss constraint $\ell_i$.

$\gamma_j$: Lagrangian multiplier of the unlabeled loss constraint $\ell_j^+$.

$\beta_j$: Lagrangian multiplier of the unlabeled loss constraint $\ell_j^-$.

$A' = [\alpha_1, \ldots, \alpha_{|\mathcal{L}|}]$, $B' = [\beta_1, \ldots, \beta_{|\mathcal{U}|}]$, $\Gamma' = [\gamma_1, \ldots, \gamma_{|\mathcal{U}|}]$,
$P' = [p_1, \ldots, p_{|\mathcal{U}|}]$, $Y' = [y_1, \ldots, y_{|\mathcal{L}|}]$,
$K_{ll} = K_{i,i'}\ \forall i, i' \in \mathcal{L}$, $K_{uu} = K_{j,j'}\ \forall j, j' \in \mathcal{U}$,
$K_{lu} = K_{i,j}\ \forall i \in \mathcal{L}, j \in \mathcal{U}$.
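As an illustration only, the dual objective of Eqn. (8) and the box constraints of Problem 2 can be evaluated numerically. The sketch below assumes NumPy arrays for all vectors and kernel blocks; the function names `dual_objective` and `dual_feasible`, and the use of `G` for the vector $\Gamma$, are hypothetical choices made here and are not taken from the paper.

```python
import numpy as np

def dual_objective(A, B, G, Y, K_ll, K_uu, K_lu):
    """Evaluate I_Dual as written in Eqn. (8); G stands for Gamma."""
    ay = A * Y        # Hadamard product A o Y
    diff = G - B      # Gamma - B
    return (A.sum() + (G + B).sum()
            - 0.5 * ay @ K_ll @ ay
            - 0.5 * diff @ K_uu @ diff
            - ay @ K_lu @ diff)

def dual_feasible(A, B, G, P, C, C_star):
    """Check the box constraints listed under Problem 2."""
    ok_a = np.all((0 <= A) & (A <= C))
    ok_g = np.all((0 <= G) & (G <= C_star * P))
    ok_b = np.all((0 <= B) & (B <= C_star * (1 - P)))
    return bool(ok_a and ok_g and ok_b)
```

Here `K_lu` is assumed to have one row per labeled sample and one column per unlabeled sample, matching the definition of $K_{lu}$ above.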

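Using the derived dual form in Problem 2 we propose an approximate optimization based on minimizing an upper bound of $\max_{A,B,\Gamma} \mathcal{I}_{Dual}$. The proposed upper bound is specified in the following theorem.

Theorem 1. Proposed upper bound for $\max_{A,B,\Gamma} \mathcal{I}_{Dual}$:

$$\max_{A,B,\Gamma} \mathcal{I}_{Dual} \le \mathcal{I}(w^*) + C^*|\mathcal{U}| + \mathcal{M}_1 + \mathcal{M}_2 \qquad (9)$$

where

$$\mathcal{I}(w^*) = \min_{w} \tfrac{1}{2}\|w\|^2 + C \sum_{i \in \mathcal{L}} \ell_i$$

$$\mathcal{M}_1 = \tfrac{1}{2} C^{*2} (1_{|\mathcal{U}|} - P)' K_{uu} P \qquad (10)$$

$$\mathcal{M}_2 = C C^* Y' K_{lu} (1_{|\mathcal{U}|} - P)$$

Proof: See the appendix. ■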

Examining the upper bound in Theorem 1, $\mathcal{I}(w^*)$ is the objective function value of optimizing a standard supervised SVM on the labeled samples $\mathcal{L}$. Therefore, it is constant, as is the term $C^*|\mathcal{U}|$. The rest of the upper bound, $\mathcal{M}_1 + \mathcal{M}_2$, is a function of $P$. The optimal values of $P$ are now obtainable through the following optimization problem.

Problem 3. Quadratic programming approximation of Semi-supervised Support Vector Machines (QP-S3VM):

$$\min_{P}\ \tfrac{1}{2} C^{*2} (1_{|\mathcal{U}|} - P)' K_{uu} P + C C^* Y' K_{lu} (1_{|\mathcal{U}|} - P) \qquad (11)$$

$$\text{subject to}\quad 0 \le P \le 1_{|\mathcal{U}|} \qquad (12)$$

Note: Equation (11) can be rewritten in the standard quadratic programming form as follows:

$$\min_{P}\ -\tfrac{1}{2} C^{*2} P' K_{uu} P + \left(\tfrac{1}{2} C^{*2}\, 1_{|\mathcal{U}|}' K_{uu} - C C^* Y' K_{lu}\right) P \qquad (13)$$

where the term $C C^* Y' K_{lu} 1_{|\mathcal{U}|}$, which does not depend on $P$, has been dropped.

The proposed approximate formulation is a quadratic programming problem in the variables $p_j$. In order to avoid trivial solutions to the problem where all the variables $p_j$ are zero, we add the constraint $P' 1_{|\mathcal{U}|} = r|\mathcal{U}|$, which makes sure that a certain ratio $r$ of the unlabeled samples is assigned to class $+1$.
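The relation between the objective in Eqn. (11) and its standard quadratic form can be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical function names (`qp_objective`, `qp_standard_form`); it only evaluates the two expressions and does not solve the quadratic program.

```python
import numpy as np

def qp_objective(P, Y, K_uu, K_lu, C, C_star):
    """Objective of Eqn. (11) for a vector P with entries in [0, 1]."""
    one = np.ones_like(P)
    return (0.5 * C_star**2 * (one - P) @ K_uu @ P
            + C * C_star * Y @ K_lu @ (one - P))

def qp_standard_form(P, Y, K_uu, K_lu, C, C_star):
    """Quadratic + linear + constant terms obtained by expanding Eqn. (11)."""
    one = np.ones_like(P)
    quad = -0.5 * C_star**2 * P @ K_uu @ P
    lin = (0.5 * C_star**2 * one @ K_uu - C * C_star * Y @ K_lu) @ P
    const = C * C_star * Y @ K_lu @ one   # independent of P
    return quad + lin + const

def balanced(P, r, tol=1e-9):
    """Balancing constraint P'1 = r|U|."""
    return abs(P.sum() - r * len(P)) <= tol
```

Because the quadratic coefficient is the negative of a kernel matrix, the problem is in general not convex, so a generic QP solver may return a local solution; this remark is an observation about the expanded form, not a claim about the solver used by the authors.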

A. QP-S3VM Model Interpretation

In this section we analyze the approximate model obtained in Problem 3. This is necessary to ensure that the approximate model does not deviate from the original S3VM problem. The first term in Eqn. (11) can be expanded as follows:

$$\tfrac{1}{2} C^{*2} (1_{|\mathcal{U}|} - P)' K_{uu} P = \tfrac{1}{2} C^{*2} \sum_{j, j' \in \{1, \ldots, |\mathcal{U}|\}} [K_{uu}]_{jj'}\, p_{j'} (1 - p_j)$$

$$= \underbrace{\tfrac{1}{2} C^{*2} \sum_{j=1}^{|\mathcal{U}|} [K_{uu}]_{jj}\, p_j (1 - p_j)}_{Q_1} + \underbrace{\tfrac{1}{2} C^{*2} \sum_{j=1}^{|\mathcal{U}|-1} \sum_{j'=j+1}^{|\mathcal{U}|} [K_{uu}]_{jj'} (p_j + p_{j'} - 2 p_j p_{j'})}_{Q_2} \qquad (14)$$

As $Q_1$ is negative quadratic in $p_j$, minimizing $Q_1$ forces the values of $p_j$ to be either 0 or 1. In other words, minimizing $Q_1$ helps make clear assignments of the labels to the unlabeled samples. To understand the implications of minimizing $Q_2$ on the solution of Problem 3 we start by plotting $z = (p_j + p_{j'} - 2 p_j p_{j'})$ for all $p_j, p_{j'} \in [0, 1]$, as shown in Fig. 1.

Fig. 1. Plot of $z = (p_j + p_{j'} - 2 p_j p_{j'})$ for all $p_j, p_{j'} \in [0, 1]$.

In Fig. 1 we see that small values of $z$, i.e. $z \approx 0$, mean that $p_j \approx p_{j'}$, while large values of $z$, i.e. $z \approx 1$, mean that $p_j p_{j'} \approx 0$. To minimize $Q_2$ we assign small $z$ to large-valued $[K_{uu}]_{jj'}$. This means that when two unlabeled samples $x_j$ and $x_{j'}$ are close, i.e. $[K_{uu}]_{jj'}$ is large, the assigned small-valued $z$ will force them to assume the same label, i.e. $p_j \approx p_{j'}$. On the other hand, if $[K_{uu}]_{jj'}$ is small, we assign a large $z$ to it. In other words, if the two unlabeled samples are not close, i.e. $[K_{uu}]_{jj'}$ is small, then they should be assigned to different classes by setting $z$ to be large, i.e. $p_j p_{j'} \approx 0$. It is easy to see now how minimizing $Q_2$ implements the clustering assumption of semi-supervised learning algorithms, where unlabeled samples form clusters and all samples in the same cluster have the same label. Notice that during the minimization of $Q_2$ a smaller minimum value is achievable if all the unlabeled samples are assigned the same label, that is when $z = 0$ and therefore $p_j \approx p_{j'}$. However, this is a degenerate solution, and this is why the balancing constraint is important in the approximate formulation in Problem 3.
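The behaviour of $z$ at the corners of the unit square, and the split of the first term of Eqn. (11) into a diagonal part and a pairwise part, can be verified with a few lines of code. This is a sketch under the assumption that $K_{uu}$ is symmetric (as kernel matrices are); the names `z`, `q1` and `q2` are illustrative.

```python
import numpy as np

def z(p_j, p_k):
    """Pairwise disagreement term p_j + p_k - 2 p_j p_k."""
    return p_j + p_k - 2.0 * p_j * p_k

def q1(P, K_uu, C_star):
    """Diagonal part: 0.5 C*^2 sum_j K_jj p_j (1 - p_j)."""
    return 0.5 * C_star**2 * np.sum(np.diag(K_uu) * P * (1.0 - P))

def q2(P, K_uu, C_star):
    """Pairwise part: 0.5 C*^2 sum_{j<k} K_jk z(p_j, p_k)."""
    total = 0.0
    n = len(P)
    for j in range(n - 1):
        for k in range(j + 1, n):
            total += K_uu[j, k] * z(P[j], P[k])
    return 0.5 * C_star**2 * total
```

With hard assignments, `z` is 0 when both samples receive the same label and 1 when they receive different labels, which is why weighting it by the kernel value penalizes splitting similar samples.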

Next we study the second term in Eqn. (11). We start by rewriting it as follows:

$$C C^* Y' K_{lu} (1_{|\mathcal{U}|} - P) = C C^* \sum_{i \in \mathcal{L},\, j \in \mathcal{U}} y_i [K_{lu}]_{ij} (1 - p_j)$$

$$= \underbrace{C C^* \sum_{i:\, y_i = +1,\ j \in \mathcal{U}} [K_{lu}]_{ij} (1 - p_j)}_{Q_3} + \underbrace{C C^* \sum_{i:\, y_i = -1,\ j \in \mathcal{U}} [K_{lu}]_{ij} (p_j - 1)}_{Q_4} \qquad (15)$$

We split Eqn. (15) into terms associated with labeled samples with $y_i = +1$, $Q_3$, and those with $y_i = -1$, $Q_4$. This is necessary because of the dependence of the interpretation on the labels $y_i$. Since $p_j \in [0, 1]$, minimizing $Q_3$ involves assigning small $(1 - p_j)$, i.e. $p_j \approx 1$, to $[K_{lu}]_{ij}$ with large values and vice versa: small-valued $[K_{lu}]_{ij}$ are assigned large $(1 - p_j)$, i.e. $p_j \approx 0$. In other words, if an unlabeled sample $x_j$ is close to a labeled sample $(x_i, y_i = +1)$, i.e. $[K_{lu}]_{ij}$ is large, then this unlabeled sample should have the same label as the labeled sample, that is $p_j \approx 1$ and $y_j = +1$. On the other hand, if the unlabeled sample $x_j$ is far from the labeled sample $(x_i, y_i = +1)$, i.e. $[K_{lu}]_{ij}$ is small, then this unlabeled sample should have the opposite label to that of the labeled sample, that is $p_j \approx 0$ and $y_j = -1$. Once again it is notable that if the balancing constraint is not used, a smaller value for the minimum of $Q_3$ is achievable if all the unlabeled samples are assigned the same label, $p_j \approx 1$ and $y_j = +1$. The same argument holds for minimizing $Q_4$, where unlabeled samples with large/small similarity to a labeled sample $(x_i, y_i = -1)$ will be assigned small/large $(p_j - 1)$, i.e. $p_j \approx 0$ and $p_j \approx 1$, respectively.

The process of jointly minimizing $Q_2$, which implements the clustering assumption of semi-supervised learning, and $Q_3 + Q_4$, where unlabeled samples are assigned labels by their similarity to labeled samples, results in a formulation that follows the same intuition behind label propagation algorithms [2] for semi-supervised learning. That is, the labeling process chooses dense regions to propagate labels through the unlabeled samples. Therefore, the approximate formulation in Problem 3 does not deviate from the general paradigm of the semi-supervised learning problem. Meanwhile, the formulation provides an insight into the connection between the Avoiding Dense Regions semi-supervised algorithms, which include S3VM, and the graph-based algorithms.
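The interpretation above can be illustrated on a toy problem small enough to enumerate. The sketch below is an assumption-laden example, not the authors' experiment: it uses a one-dimensional data set invented here, an RBF kernel with unit bandwidth, $C = C^* = 1$, hard assignments $p_j \in \{0, 1\}$, and a balancing ratio $r = 0.5$, and it searches all balanced assignments by brute force.

```python
from itertools import combinations
import numpy as np

def rbf(a, b):
    """RBF kernel matrix between two 1-D sample arrays (unit bandwidth)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

def discrete_objective(pos, x_u, x_l, y_l, C=1.0, C_star=1.0):
    """Objective of Eqn. (11) with p_j = 1 for j in pos and 0 otherwise."""
    P = np.zeros(len(x_u))
    P[np.asarray(list(pos), dtype=int)] = 1.0
    K_uu = rbf(x_u, x_u)
    K_lu = rbf(x_l, x_u)
    return (0.5 * C_star**2 * (1 - P) @ K_uu @ P
            + C * C_star * y_l @ K_lu @ (1 - P))

# Hypothetical toy data: two tight clusters, one labeled sample in each.
x_u = np.array([0.0, 0.1, 5.0, 5.1])
x_l = np.array([0.05, 5.05])
y_l = np.array([1.0, -1.0])

# Balancing constraint with r = 0.5: exactly two unlabeled samples get +1.
best = min(combinations(range(len(x_u)), 2),
           key=lambda s: discrete_objective(s, x_u, x_l, y_l))
```

On this data the minimizer assigns $+1$ to the two unlabeled samples next to the positively labeled sample, which is the cluster-consistent labeling the analysis predicts.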

III. Submodular Optimization of Approximate QP-S3VM

The approximate QP-S3VM formulation proposed in Problem 3 is simple and intuitive. However, because it is a quadratic minimization of a concave function, the computational complexity of finding a solution becomes a hindrance, especially for semi-supervised learning problems, which are inherently large scale. In this section we use the concepts of submodular set functions to provide a simple and efficient algorithm for the proposed approximate QP-S3VM problem.

Submodular set functions play a central role in combinatorial optimization [10]. They are considered the discrete analog of convex functions in continuous optimization, in the sense that they have structural properties that can be exploited algorithmically. They also emerge as a natural structural form in classic combinatorial problems such as maximum coverage and maximum facility location in location analysis, as well as max-cut problems in graphs. More recently, submodular set functions have become key concepts in machine learning, where problems such as feature selection [11] and active learning [12] are solved by maximizing submodular set functions, while other core problems like clustering and learning structures of graphical models have been formulated as submodular set function minimization [13].

As discussed in Section II, the solution of the approximate QP-S3VM provides a value for the variable $p_j$ associated with each unlabeled sample $x_j$, $j \in \mathcal{U}$, such that $p_j = 1$ for $y_j = +1$ and $p_j = 0$ for $y_j = -1$. In this section we use a different perspective of the problem. In this new perspective the problem of binary semi-supervised classification in general is concerned with choosing a subset $A$ from the pool of all unlabeled samples $\mathcal{U}$. All the unlabeled samples $x_j$, $j \in A$, should be assigned the label $y_j = +1$ and the rest of them, $x_j$, $j \in \mathcal{U} \setminus A$, will be assigned the label $y_j = -1$. Each possible subset $A$ is assigned a value by a set function $f(A)$ that has the same optimal solution, in terms of $A$ and $\mathcal{U} \setminus A$, as the original semi-supervised classification problem. What makes the reformulation of semi-supervised learning into set functions interesting is that if the set function $f(A)$ is monotonic submodular, many algorithms can solve the problem efficiently [10]. In the following we give some background on the concept of submodularity in set functions and how we employ it to solve our problem efficiently.

Let $f(X)$ be a set function defined on the set $X = \{x_1, x_2, \ldots, x_n\}$. The monotonicity and submodularity of $f(X)$ are defined as follows [10]:

Definition 1. For all sets $A, B \subseteq X$ with $A \subseteq B$, a set function $f : 2^X \to \mathbb{R}$ is:

a) Monotonic if

$$f(A) \le f(B)$$

b) Submodular if

$$f(A \cup \{x_j\}) - f(A) \ge f(B \cup \{x_j\}) - f(B)$$

for all $x_j \notin B$.
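For small ground sets, both properties in Definition 1 can be checked exhaustively. The sketch below is illustrative only: the helper names are hypothetical, the check is exponential in the size of the ground set, and the coverage function used as an example is a standard textbook instance rather than the set function proposed in this paper.

```python
from itertools import combinations

def all_subsets(ground):
    """Every subset of the ground set, as frozensets."""
    items = list(ground)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_monotone(f, ground):
    """Definition 1(a): f(A) <= f(B) whenever A is a subset of B."""
    subs = all_subsets(ground)
    return all(f(A) <= f(B) for A in subs for B in subs if A <= B)

def is_submodular(f, ground):
    """Definition 1(b): diminishing marginal gains for every x not in B."""
    subs = all_subsets(ground)
    return all(
        f(A | {x}) - f(A) >= f(B | {x}) - f(B)
        for A in subs for B in subs if A <= B
        for x in ground if x not in B
    )

# Example: a coverage function, f(A) = number of items covered by A.
covers = {0: {1, 2}, 1: {2, 3}, 2: {3}}

def coverage(A):
    return len(set().union(*(covers[x] for x in A)))
```

A function such as $f(A) = |A|^2$ is monotone but fails the second check, since its marginal gains grow with the size of the set.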

A well acknowledged result by Nemhauser et al. [5], see Theorem 2 below, establishes a lower bound on the performance of the simple greedy algorithm, see Algorithm 1, when it is used to maximize a monotone submodular set function subject to a cardinality constraint. The simple greedy algorithm works by repeatedly adding the element that maximally increases the objective value, and according to Theorem 2 this simple procedure is guaranteed to achieve at least a constant fraction $(1 - 1/e)$ of the optimal value, where $e$ is the base of the natural logarithm.

Theorem 2. Given a finite set $X = \{x_1, x_2, \ldots, x_n\}$ and a monotonic submodular function $f(A)$, where $A \subseteq X$ and $f(\emptyset) = 0$. For the following maximization problem,

$$A^* = \arg\max_{|A| \le k} f(A).$$

The greedy maximization algorithm returns $A_{Greedy}$ such that

$$f(A_{Greedy}) \ge \left(1 - \frac{1}{e}\right) f(A^*).$$



Algorithm 1: Greedy Algorithm for Submodular Function Maximization with Cardinality Constraint [5], [14]

1. Start with $X_0 = \emptyset$

2. For $i = 1$ to $k$

   $x^* := \arg\max_{x} f(X_{i-1} \cup \{x\}) - f(X_{i-1})$
   $X_i := X_{i-1} \cup \{x^*\}$
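The two steps of the greedy procedure translate almost directly into code. The following is a minimal sketch with hypothetical names (`greedy_maximize`, `coverage`), assuming the set function accepts a list of chosen elements; it uses plain re-evaluation of $f$ at every step and none of the usual accelerations such as lazy evaluation.

```python
def greedy_maximize(f, ground, k):
    """Greedily pick up to k elements, each time adding the element
    with the largest marginal gain f(X + [x]) - f(X)."""
    chosen = []
    for _ in range(min(k, len(ground))):
        base = f(chosen)
        best, best_gain = None, float("-inf")
        for x in ground:
            if x in chosen:
                continue
            gain = f(chosen + [x]) - base
            if gain > best_gain:
                best, best_gain = x, gain
        chosen.append(best)
    return chosen

# Example on a coverage function (monotone and submodular).
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(A):
    return len(set().union(*(covers[x] for x in A)))

picked = greedy_maximize(coverage, list(covers), 2)
```

In this example the first pick is the set covering four items and the second pick is the set adding three new items, so seven items are covered in total.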



A. Solving QP-S3VM Using Submodular Optimization

In this section we use the concepts of submodular function maximization to provide an efficient and simple algorithm for solving the approximate QP-S3VM problem. Towards this goal we propose the following submodular maximization problem that is equivalent to the approximate QP-S3VM in Problem 3.

Problem 4. Submodular maximization formulation that is equivalent to Problem 3:

$$\max_{A \subseteq \mathcal{U}} S(A) \qquad (16)$$

where



S{A) = - -C 
[Kiu]



«j 



C*' 



j.j'eA 



\K,., 



Jjj' 



jJ'eA'- 



-C*^\U\ + CC*\C\ 



2^ 



(17) 
where $S$ is a submodular set function defined on all subsets $A \subseteq \mathcal{U}$ of unlabeled samples assigned to the class $y_j = +1$, $0 \le K_{ij} \le d$, and $\delta_{jj'} = 1$ for $j = j'$ and 0 otherwise.

Problem 4 maximizes the negative of a discrete version of the objective function in Eqn. (11). The correspondence between the first three terms in $S(A)$ and Eqn. (13) is straightforward. However, the term $Q_5$ is of our design and is added to ensure the monotonicity and submodularity of $S(A)$, as shown in Theorem 3. The constant $d$ is the maximum value of the kernel matrix. Therefore $d = 1$ for Radial Basis Function (RBF) kernels. If the data is feature-wise normalized, a highly recommended practice, with values in $[0, 1]$, then for the linear kernel $d$ is equal to the number of dimensions of the data set (for dense data) or the average number of non-zero features (for sparse data). Since for a fixed $|A|$ the value of $Q_5$ is constant, the optimal solution obtained by optimizing $S(A)$ is not affected by adding $Q_5$. In other words, $Q_5$ depends on the cardinality of $A$, not its contents.

Theorem 3. The set function $S(A)$ in Problem 4 is monotone (non-decreasing), submodular, and $S(\emptyset) = 0$.

Proof: See the appendix. ■

Now that we have shown that $S(A)$ is monotonic, submodular, and $S(\emptyset) = 0$, the greedy maximization algorithm can be used to optimize Problem 4 and the performance guarantee in Theorem 2 holds true.

To summarize, the proposed equivalent submodular maximization in Problem 4 is defined on all subsets $A$ of samples belonging to the class labeled $y_j = +1$. The greedy algorithm in Algorithm 1 is used to solve the problem efficiently. Once the optimum solution $A^*$ is determined, the rest of the unlabeled samples, i.e. $\mathcal{U} \setminus A^*$, will belong to the class with labels $y_j = -1$. We use the proposed algorithm in the transductive setting of semi-supervised learning. However, if the inductive setting is needed, a standard supervised SVM training can be performed to give the final hyperplane $w$.
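The overall transductive procedure can be sketched as a greedy selection of the positive set. Note the substitution made here: the score below is simply the negative of the discrete objective of Eqn. (11), without the additional term $Q_5$, so it is not the set function $S(A)$ of Problem 4 and the guarantee of Theorem 2 is not claimed for it. The toy data, the unit-bandwidth RBF kernel, the parameter values, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf(a, b):
    """RBF kernel matrix between two 1-D sample arrays (unit bandwidth)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

def score(A, K_uu, K_lu, y_l, C=1.0, C_star=1.0):
    """Negative of the Eqn. (11) objective with p_j = 1 for j in A."""
    P = np.zeros(K_uu.shape[0])
    P[np.asarray(list(A), dtype=int)] = 1.0
    return -(0.5 * C_star**2 * (1 - P) @ K_uu @ P
             + C * C_star * y_l @ K_lu @ (1 - P))

def greedy_positive_set(K_uu, K_lu, y_l, k):
    """Add, k times, the unlabeled index that most increases the score."""
    A = []
    for _ in range(k):
        rest = [j for j in range(K_uu.shape[0]) if j not in A]
        gains = [score(A + [j], K_uu, K_lu, y_l) for j in rest]
        A.append(rest[int(np.argmax(gains))])
    return sorted(A)

# Hypothetical toy data: two clusters, one labeled sample in each.
x_u = np.array([0.0, 0.1, 5.0, 5.1])
x_l = np.array([0.05, 5.05])
y_l = np.array([1.0, -1.0])

positives = greedy_positive_set(rbf(x_u, x_u), rbf(x_l, x_u), y_l, k=2)
labels = np.where(np.isin(np.arange(len(x_u)), positives), 1, -1)
```

The budget `k` plays the role of $r|\mathcal{U}|$; samples outside the selected set receive the label $-1$, mirroring the assignment of $\mathcal{U} \setminus A^*$ described above.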



IV. Experimental Results 

In this section we illustrate the accuracy and efficiency of the proposed QP-S3VM and its submodular optimization (S-QP-S3VM). To this end, we compare the performance of QP-S3VM and S-QP-S3VM with three competitive S3VM algorithms, namely the Transductive Support Vector Machine (TSVM) [7], the Deterministic Annealing for Semi-supervised Kernel Machines (DA) [8], and ∇TSVM [15]. All experiments are performed on a 2 GHz Intel Core2 Duo machine with 2 GB RAM. The experiments are performed on several real world data sets, see Table I, that are selected so as to achieve diversity in terms of dimensionality and distribution properties.

TABLE I
Data sets used in the experiments [16], [17].

Data set        Features    Samples  Labeled  C      C*/C   r
australian      14          690      3        0.922  10^-1  0.44
w6a             300         1,900    19       0.838  10^-4  0.5
svmguide1       4           3,089    15       1.055  10^-3  0.65
a9a             123         15,680   78       0.897  10^-3  0.5
news20.binary   1,355,191   19,900   100      6.087  10^-3  0.5
real-sim        20,958      72,309   8        1      10^-4  0.31
KDD-99          122         10^      10       1      10^-4  0.56



In the transductive learning accuracy experiment we considered a challenging setup where the number of labeled samples does not exceed 1% of the available unlabeled data, and in two data sets the percentage is as low as 0.01%. The labeled/unlabeled sample splitting process is repeated 10 times and the average is reported in Table II. To illustrate the value of using unlabeled samples in the semi-supervised setting, the results of a standard SVM trained using only the labeled samples are presented. All experiments use the linear kernel with feature-wise normalized data. The ratio of positive samples in the output, $r$, is set to the correct ratio in the unlabeled samples. It is clear in Table II that QP-S3VM and S-QP-S3VM are superior in terms of accuracy to TSVM, DA, and ∇TSVM.



In Table III we provide a CPU-time comparison between QP-S3VM, S-QP-S3VM, TSVM, DA, and ∇TSVM. It is clear that, from the time complexity perspective, S-QP-S3VM is far more efficient than its competitors.

TABLE III
CPU time (seconds) experiments.

Data set        TSVM    DA     ∇TSVM  QP-S3VM   S-QP-S3VM
australian      11.73   0.786  0.452  174.82    0.013
w6a             109.40  0.836  2.491  6,993.12  0.038
svmguide1       186.59  2.46   0.803  -         0.008
a9a             206.30  20.78  18.68  -         0.335
news20.binary   -       653.4  -      -         3.241
real-sim        -       89.38  -      -         1.925
KDD-99          -       2,740  -      -         1,620



TABLE II
Classification accuracy experiments for medium size data sets.

Data set        SVM     TSVM   DA     ∇TSVM  QP-S3VM  S-QP-S3VM
australian      50.029  63.26  60.48  56.53  75.57    74.49
w6a             67.44   58.73  68.09  52.60  72.33    70.75
svmguide1       71.19   77.31  80.98  69.71  92.73    92.45
a9a             66.91   71.49  72.91  64.43  -        74.90
news20.binary   63.35   -      67.94  -      -        71.44
real-sim        52.13   -      69.23  -      -        71.83
KDD-99          72.12   -      97.12  -      -        98.46

V. Conclusion and Future Work

In this paper we propose a quadratic programming approximation of the semi-supervised SVM problem (QP-S3VM) that proved to be efficient to solve using standard optimization techniques. One major contribution of the proposed QP-S3VM is that it establishes a link between the two major paradigms of semi-supervised learning, namely low density separation methods and graph-based methods. Such a link is considered a significant step towards a unifying framework for semi-supervised learning methods. Furthermore, we propose a novel formulation of semi-supervised learning problems in terms of submodular set functions which, to the best of the authors' knowledge, is the first time such an idea is presented. Using this new formulation we present a methodology to use submodular optimization techniques to efficiently solve the proposed QP-S3VM problem. Finally, our idea of representing semi-supervised learning problems as submodular set functions will have a great impact on many learning schemes, as it will open the door for using an arsenal of algorithms that have theoretical guarantees and efficient performance. The authors are already making progress in extending the presented work to multi-class semi-supervised formulations, as well as examining the relationship between submodular optimization over different matroids and its interpretation in terms of semi-supervised learning. One last intriguing point about the proposed work is that samples are assigned to classes, in our case the positive class, sequentially. This opens the door for possible ways to estimate the ratio of positive samples $r$ automatically during the learning process, which is still a problem for most semi-supervised techniques, especially if there exists a difference in the ratio $r$ between the labeled and unlabeled samples.

VI. Appendix 

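A. Proof of Theorem 1

To get an upper bound for $\mathcal{I}_{Dual}$ we divide it into several components as follows:

$$\mathcal{I}_{Dual} = \mathcal{N}_1 + \mathcal{N}_2 + \mathcal{N}_3 \qquad (18)$$

where

$$\mathcal{N}_1 = A' 1_{|\mathcal{L}|} - \tfrac{1}{2}(A \circ Y)' K_{ll} (A \circ Y)$$

$$\mathcal{N}_2 = (\Gamma + B)' 1_{|\mathcal{U}|} - \tfrac{1}{2}(\Gamma - B)' K_{uu} (\Gamma - B) \qquad (19)$$

$$\mathcal{N}_3 = -(A \circ Y)' K_{lu} (\Gamma - B).$$

Then

$$\max_{A,B,\Gamma} \mathcal{I}_{Dual} \le \max_{A} \mathcal{N}_1 + \max_{B,\Gamma} \mathcal{N}_2 + \max_{A,B,\Gamma} \mathcal{N}_3 \qquad (20)$$

$\max_{A} \mathcal{N}_1$ is the dual form of a standard supervised SVM problem using the labeled data, i.e.

$$\max_{A} \mathcal{N}_1 = \min_{w} \tfrac{1}{2}\|w\|^2 + C \sum_{i \in \mathcal{L}} \ell_i \qquad (21)$$

Furthermore, using the value limits of $A$, $B$ and $\Gamma$, i.e. $0 \le A \le C\, 1_{|\mathcal{L}|}$, $0 \le B \le C^*(1_{|\mathcal{U}|} - P)$ and $0 \le \Gamma \le C^* P$, we can derive the following upper bounds of $\mathcal{N}_2$ and $\mathcal{N}_3$,

$$\max_{B,\Gamma} \mathcal{N}_2 \le C^*|\mathcal{U}| + \tfrac{1}{2} C^{*2} (1_{|\mathcal{U}|} - P)' K_{uu} P \qquad (22)$$

and

$$\max_{A,B,\Gamma} \mathcal{N}_3 \le C C^* Y' K_{lu} (1_{|\mathcal{U}|} - P). \qquad (23)$$

Combining the three upper bounds we get the provided bound in the theorem.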

B. Proof of Theorem 3

First, $S(\emptyset) = 0$ follows directly from the definition in Eqn. (17), where all the summations are on elements in the set $A$. Therefore if $A = \emptyset$ then $S(A) = 0$. For the sake of simplicity we consider the special case where $d = 1$. However, the extension to the general values of $d$ is fairly straightforward. Next we prove the monotonicity property. Using the definition of $S(A)$, we can show that for any $m \in \mathcal{U}$ and $m \notin A$, the increase in the objective value of $S$ due to adding $m$ is,

S{AUm) -S{A) = 
- \c*^ Y. [Kuu],„,,, + CC* Y. y, [K,J,_„ 

j'eA 
+ \c*^ ([K,u],„,,„ - 1) + \c*M + CC*\C\ 

Since we are examining the case where d = \, then 
1. Therefore, since 



< K,j < 1 and K 



y'{ 



K, 



1 



j'6-4 

CC*|£|+CC*^y,[K,u], 
iec 



>0 



(25) 



*2 V^ 



lc*'\u\>^-c* ^ 



:k,u]„,,, + c*^i^i 



then 



$$S(A \cup m) - S(A) \ge 0$$



Thus the monotonicity property of S{A) holds true. 

Now we prove the submodularity of $S(A)$ by assuming the set $B = \{A \cup q\}$ where $q \in \mathcal{U}$. Using the same set element $m$ we used earlier, i.e. $m \in \mathcal{U}$ and $m \notin A$, we need to show that adding $m$ to the set $A$ has more effect than adding it to the set $B$, as stated in Definition 1(b). Since



S{B) 



ic*' 5] [K„J^.^.,+CC*5] y. [KiJ, 



]e{A\jq},]'eu 



je{A\jq},iec 



W'T. 



[K„ 



hj' 



],j'e{AUq} 
j,j'&{AUq} 



6,.r{lc*'\U\ + CC*\C\ 



\c- 



(26) 



then 

S{B U m) - S{B) = 

- 9^* / JKuu]^ .,v 



j'ew 



1 9 

2^ 



,-C*^{\A\ + l) 
3 



j'aiAVJq} 

Therefore 

$$(S(A \cup m) - S(A)) - (S(B \cup m) - S(B))$$



(27) 



-C*^|Z^| + CC* |£| 



$$= C^{*2} \left(1 - [K_{uu}]_{m,q}\right) \ge 0$$
Hence the set function $S(A)$ is submodular.



(28) 


